Dear Reddit community,
Three years ago, we announced the beta release of SpeechBrain in this subreddit. Today, we are thrilled to announce the official release of SpeechBrain 1.0. This milestone comes with numerous enhancements and advancements over previous versions.
This is a real community effort (200k downloads per month on PyPI), and we are happy and proud to lead it.
You can explore a comprehensive summary of our improvements here: SpeechBrain 1.0 Summary.
Among the changes, we have made significant improvements to speech recognition, enhancing decoding through integration with K2 for finite-state transducers, CTC decoding, and n-gram rescoring. We have also introduced novel models such as streamable Conformer transducers, Branchformer, and HyperConformer, among others, to improve accuracy and speed. You can now also easily use and fine-tune large language models (LLMs), or simply employ them to rescore ASR hypotheses.
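For instance, here is a minimal sketch of transcribing a file with one of our pretrained models via the refreshed inference API (the checkpoint name is one of the models we host on the HF hub; `my_recording.wav` is a placeholder for your own audio):

```python
# Minimal sketch: transcribe a file with a pretrained model fetched from the
# HuggingFace hub. In 1.0, the pretrained interfaces live under
# speechbrain.inference (formerly speechbrain.pretrained).
from speechbrain.inference.ASR import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",         # checkpoint on the HF hub
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",  # local cache directory
)
print(asr_model.transcribe_file("my_recording.wav"))  # placeholder path to your audio
```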
Furthermore, SpeechBrain now supports a broader spectrum of speech, audio, text, and EEG processing tasks. We improved our HuggingFace (HF) integration, making it easier to import any model from the HF hub. We also implemented modern techniques and models, including continual learning, diffusion models, hypernetworks, Bayesian ASR, and more.
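To give a flavor of the HF integration, here is a hedged sketch of wrapping an HF encoder as a regular SpeechBrain module; the wrapper path and arguments below are our reading of the refactored `huggingface_transformers` lobes in 1.0, so please check the API docs for the exact names in your version:

```python
# Hedged sketch: use a HuggingFace encoder as a SpeechBrain module.
# The module path and arguments are assumptions based on the 1.0
# huggingface_transformers lobes; consult the API docs if they differ.
import torch
from speechbrain.lobes.models.huggingface_transformers.wav2vec2 import Wav2Vec2

encoder = Wav2Vec2(
    source="facebook/wav2vec2-base-960h",    # any compatible HF hub checkpoint
    save_path="pretrained_models/wav2vec2",  # local cache directory
    freeze=True,                             # use as a frozen feature extractor
)

wavs = torch.randn(2, 16000)  # dummy batch: two 1-second waveforms at 16 kHz
feats = encoder(wavs)         # -> (batch, frames, hidden_dim) features
```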
We have also created a new benchmark repository featuring benchmarks for self-supervised learning (MP3S), EEG processing (SpeechBrain-MOABB), and continual learning of new languages (CL-MASR).
Many more novelties can be found in the provided Colab notebook.
Stay tuned: we have big plans ahead.
Of course, huge thanks to our generous sponsors HuggingFace, OVHCloud, and ViaDialog, and our partners at Concordia, Avignon, Mila, Cambridge University, and Samsung, as well as all our amazing contributors!
Regards,
The SpeechBrain Core Team